Pang Wei Koh

ReasonIR: Training Retrievers for Reasoning Tasks

Apr 29, 2025
Via arXiv

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

Apr 28, 2025
Via arXiv

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

Apr 20, 2025
Via arXiv

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Apr 15, 2025
Via arXiv

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Apr 09, 2025
Via arXiv

EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees

Mar 11, 2025
Via arXiv

Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions

Mar 10, 2025
Via arXiv

Large-Scale Data Selection for Instruction Tuning

Mar 03, 2025
Via arXiv

S4S: Solving for a Diffusion Model Solver

Feb 24, 2025
Via arXiv

ICONS: Influence Consensus for Vision-Language Data Selection

Jan 06, 2025
Via arXiv